Abstract: The probability that an advantageous mutant rises to fixation in
a viral quasispecies is investigated in the framework of multi-type branching
processes. Whether fixation is possible depends on the overall growth rate
of the quasispecies that will form if invasion is successful, rather than on the
individual fitness of the invading mutant. The exact fixation probability can
only be calculated if the fitnesses of all potential members of the invading
quasispecies are known. Quasispecies fixation has two important characteristics:
First, a sequence with negative selection coefficient has a positive
fixation probability as long as it has the potential to grow into a quasispecies
with an overall growth rate that exceeds that of the established
quasispecies. Second, the fixation probabilities of sequences with identical
fitnesses can nevertheless vary over many orders of magnitude. Two approximations
for the probability of fixation are introduced. Both approximations
require only partial knowledge about the potential members of the invading
quasispecies. The performance of these two approximations is compared to
the exact fixation probability on a network of RNA sequences with identical
secondary structure.
INTRODUCTION



One of the most remarkable aspects of the dynamics of RNA viruses is the 
high rate at which mutant variants are produced. At mutation rates close to 
one substitution per genome per generation (Drake 1993; Drake and Holland
1999), a virus population forms a highly diverse cloud of mutants (Domingo
et al. 1976; Domingo et al. 1978; Holland et al. 1982; Steinhauer et al. 1989;
Biebricher and Luce 1993; Burch and Chao 2000), a so-called quasispecies
(Eigen and Schuster 1979; Nowak 1992; Domingo and Holland 1997; Domingo
et al. 2001). At the same time, the sequence space is so large that even for
population sizes up to 10^^, there is a constant stream of new mutants that 
have never existed before. Most of these mutants have impaired fitness, 
but occasionally, a new mutant will fare better than all currently existing 
virions, for example by presenting an epitope that the immune system fails to 
recognize. With a certain probability, this mutant will rise to fixation, where 
fixation is understood in the sense that the mutant becomes the ancestor of 
a new quasispecies which completely replaces the currently existing one. 

The problem of the fixation of an advantageous mutant is an old one, 
with a long history of investigations in classical population genetics, reaching
back to Haldane and Fisher (Fisher 1922; Fisher 1930; Haldane 1927;
Kimura 1957; Kimura 1964; Kimura 1970; Kimura and King 1979; Ewens
1967; Bürger and Ewens 1995; Barton 1995; Otto and Barton 1997; Pollak
2000). However, these investigations differ from the quasispecies case in one
important aspect: the mutation rates considered. In classical population 
genetics, the usual assumption is that mutations are rare events, such that 
an invading mutant will not mutate again while it is moving towards either 
fixation or extinction. In the quasispecies setting, on the other hand, most 
of the immediate offspring of a mutant will have further mutations, and their 
offspring as well, and so on. As a consequence, the fitness of a prospective 
invading quasispecies is not given by the fitness of the initial mutant, but 
rather by the average fitness of the offspring mutant cloud that will form 
eventually. One of the more surprising results of these dynamics is that a
mutant with the ability to replace the currently existing quasispecies may
actually have a reduced replication rate, if at the same time its robustness
against further mutations is increased (Schuster and Swetina 1988; Wilke
et al. 2001; Wilke 2001b; Krakauer and Plotkin 2002).



Quasispecies theory in its original formulation by Eigen and Schuster 
(1979) is based on deterministic differential equations, and as such cannot
deal with the fluctuations that are responsible for fixation or extinction of
individual mutants. Within the more general mathematical framework of
multi-type branching processes, it is possible to describe both the deterministic
aspects of large populations and the fluctuations inherent in the
dynamics of small and very small populations (Demetrius et al. 1985; Hofbauer
and Sigmund 1988; Hermisson et al. 2002). An expression for the
probability of fixation follows naturally from branching process theory. We
will discuss how this expression relates to the predictions of the determin- 
istic quasispecies equations, as well as to the results of classical population 
genetics. 

The remainder of the paper is organized as follows. First, we derive 
a general expression for the probability of fixation in an arbitrary fitness 
landscape. Then, we discuss the special case of fixation on a neutral network, 
that is, the case in which all sequences of the invading quasispecies have the 
same fitness, and derive two approximations for the fixation probability that 
can be evaluated without the knowledge of the full fitness landscape. In 
order to give a concrete example, we apply both the exact expression and 
the approximations to a known network of over 50,000 RNA sequences. For 
this neutral network, we also discuss how the fixation probability changes if 
multiple sequences invade at the same time. 

THEORETICAL FRAMEWORK 

For a population evolving under high mutational pressure, we have to un- 
derstand fixation in the sense that a mutant is fixed once it has become a 
common ancestor of the whole population. The more traditional definition 
of fixation, which is to regard a mutation as fixed if all sequences in the 
population carry it, is not applicable: The mutational pressure constantly 
creates new deleterious mutants, which may not carry a particular mutation 
although their ancestors did so. If we understand fixation as the process by 
which a mutant becomes a common ancestor of the whole population, then 
the probability that a mutant is fixed is given by the probability that the 
cascade of further mutated offspring of the invading mutant does not come 
to a halt. We can calculate this probability from the theory of multi-type 
branching processes. 

The general setting to which our theory applies is as follows. Consider a 
viral quasispecies in mutation-selection balance, with an average fitness $\langle w \rangle$.
If generations are discrete and non-overlapping, and the population size $N$
is constant, then the probability that a virion $i$ produces $k$ offspring in one
generation is given by Wright-Fisher sampling,

$$P(k|i) = \binom{N}{k}\, \xi_i^{k} (1 - \xi_i)^{N-k}, \qquad (1)$$

with $\xi_i = w_i / (\langle w \rangle N)$, where $w_i$ is the fitness of virion $i$.

Assume that a rare mutation leads to the emergence of a virion with the 
potential to form a new quasispecies, and to replace the already established 
one in the process. This new quasispecies (in the following also called the 
invading quasispecies) may consist of sequences of type $1, 2, \dots, n$, with replication
rates $w_i$. Let the probability that a sequence $j$ produces an erroneous
copy $i$ be given by $Q_{ij}$. As long as the total abundance of the invading quasispecies
is small compared to the established quasispecies, we can assume that
$\langle w \rangle$ is not affected by the presence of the invading quasispecies. Then, the
probability that a single sequence of type $i$ generates $(k_1, \dots, k_n)$ offspring of
types $1, \dots, n$ can be expressed as (see Appendix)



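$$P(k_1, \dots, k_n | i) = \frac{N!}{\left(N - \sum_r k_r\right)!\, \prod_r k_r!} \prod_{r=1}^{n} \left(\frac{M_{ir}}{N}\right)^{k_r} \left(1 - \sum_{r=1}^{n} \frac{M_{ir}}{N}\right)^{N - \sum_r k_r}, \qquad (2)$$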



with $M_{ij} = w_i Q_{ji} / \langle w \rangle$. The matrix elements $M_{ij}$ give the expected number
of offspring of type $j$ from sequences of type $i$ in one generation. In the
following, we will assume that the population size is so large that we can
approximate $P(k_1, \dots, k_n | i)$ by its limit for an infinitely large population.
This limit is a multivariate Poisson distribution:

$$P(k_1, \dots, k_n | i) = \prod_{r=1}^{n} \frac{M_{ir}^{k_r}}{k_r!}\, e^{-M_{ir}}. \qquad (3)$$
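As a quick numerical sanity check of the infinite-population limit, the following sketch compares the single-type binomial offspring distribution of Eq. (1) with the Poisson distribution it approaches as $N$ grows. The function names and the example mean offspring number (1.05) are illustrative assumptions, not part of the analysis in the text.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of k offspring under Wright-Fisher (binomial) sampling."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of k offspring under a Poisson distribution."""
    return mean**k * math.exp(-mean) / math.factorial(k)


def max_abs_difference(mean: float, n: int, k_max: int = 20) -> float:
    """Largest pointwise gap between the two distributions for k = 0..k_max."""
    p = mean / n  # success probability per slot, M/N in the notation of the text
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, mean))
        for k in range(k_max + 1)
    )


if __name__ == "__main__":
    # Hypothetical mean offspring number; the gap shrinks as N grows.
    for n in (100, 10_000, 1_000_000):
        print(n, max_abs_difference(1.05, n))
```

The gap between the two distributions shrinks roughly in proportion to $1/N$, which is why the Poisson form is a reasonable stand-in for large viral populations.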



By using the theory of branching processes, and by assuming an infinite 
population size in Eq. (3), we restrict the applicability of our theory to certain
scenarios. We can apply our theory only to those types of fixation events
that increase the average fitness of the population. The situation of genetic
drift, whereby a neutral or deleterious mutant is fixed because of stochastic
fluctuations in a small population (Kimura 1970; Kimura and King 1979),
is not covered by our theory. This latter type of fixation event reduces the
average fitness or leaves it unaltered.

Let $x_i$ be the probability that the offspring cascade spawned by a sequence
$i$ goes extinct after a finite number of generations. From the theory
of multitype branching processes (Harris 1963), we know that the vector
of extinction probabilities $x = (x_1, \dots, x_n)$ satisfies $x = f(x)$, where
$f(z) = (f_1(z), \dots, f_n(z))$ is the probability-generating function of the distribution
of offspring probabilities $P(k_1, \dots, k_n | i)$. The probability-generating
function is defined as

$$f_i(z) = \sum_{k_1, \dots, k_n} P(k_1, \dots, k_n | i)\, z_1^{k_1} \cdots z_n^{k_n}. \qquad (4)$$



After inserting Eq. (3) into Eq. (4), we obtain $f_i(z) = \exp\left[\sum_j M_{ij}(z_j - 1)\right]$. With
the convention $e^{z} = (e^{z_1}, \dots, e^{z_n})$, we can rewrite this expression as

$$f(z) = e^{M(z - 1)}. \qquad (5)$$

Since the probability of fixation $\pi_i$ of a sequence $i$ is given by the probability
that the offspring cascade spawned by $i$ does not go extinct, we have
$\pi_i = 1 - x_i$. The vector of fixation probabilities therefore satisfies $1 - \pi = f(1 - \pi)$.
With Eq. (5), we find

$$1 - \pi = e^{-M\pi}. \qquad (6)$$

This equation has exactly one solution with $0 < \pi_i < 1$ for all $i$ if the spectral
radius¹ $\rho_M$ of $M$ is larger than one (Harris 1963). Otherwise, $\pi_i = 0$ for all
$i$.
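The fixed-point equation $1 - \pi = e^{-M\pi}$ can be solved numerically by iterating the generating function on the extinction probabilities, starting from zero; this is a standard technique for branching processes and not necessarily the numerical method used for the results reported here. The sketch below is a minimal pure-Python version; the function name, tolerances, and the two small example matrices are illustrative assumptions.

```python
import math
from typing import List


def fixation_probabilities(
    M: List[List[float]], tol: float = 1e-12, max_iter: int = 100_000
) -> List[float]:
    """Solve 1 - pi = exp(-M pi) by iterating x <- exp(M (x - 1)) from x = 0.

    x is the vector of extinction probabilities; iterating the generating
    function from zero converges to its smallest non-negative fixed point.
    """
    n = len(M)
    x = [0.0] * n
    for _ in range(max_iter):
        x_new = [
            math.exp(sum(M[i][j] * (x[j] - 1.0) for j in range(n)))
            for i in range(n)
        ]
        converged = max(abs(a - b) for a, b in zip(x_new, x)) < tol
        x = x_new
        if converged:
            break
    return [1.0 - xi for xi in x]


if __name__ == "__main__":
    # Hypothetical two-type example: each type alone is subcritical
    # (M_ii = 0.95 < 1), but the spectral radius of M is 1.15 > 1.
    M = [[0.95, 0.20], [0.20, 0.95]]
    print(fixation_probabilities(M))
    # A single subcritical type goes extinct with certainty.
    print(fixation_probabilities([[0.9]]))
```

Convergence is linear and becomes slow when the spectral radius of $M$ is close to one, so the iteration cap may need to be raised in that regime.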

In order to compare Eq. (6) to the result of Haldane (1927), we take the
logarithm on both sides of Eq. (6) and expand to second order:

$$\log(1 - \pi_i) \approx -\pi_i - \pi_i^2/2 = -\sum_k M_{ik} \pi_k. \qquad (7)$$

With $s_i = M_{ii} - 1$, this simplifies to

$$\pi_i \approx s_i + \sqrt{s_i^2 + 2 \sum_{k \neq i} M_{ik} \pi_k}. \qquad (8)$$



¹ For the matrices $M$ we are considering here, the spectral radius coincides with the
largest positive eigenvalue of $M$, by virtue of the Frobenius-Perron theorem.
If $s_i > 0$ and all off-diagonal elements of $M$ are zero, then Eq. (8) reduces
to Haldane's result $\pi_i = 2 s_i$, that is, the fixation probability of a sequence
is twice its selective advantage. If the off-diagonal elements are non-zero,
then the fixation probability is increased, because the invading sequence gets
support from its mutational neighbors. In particular, even if some $s_i < 0$,
the corresponding $\pi_i$ are positive as long as $\rho_M > 1$. This means that in
quasispecies fixation, sequences that by themselves reproduce too slowly to
outcompete the currently established quasispecies can nevertheless found a
new quasispecies that grows fast enough to overtake the population.
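To make the role of mutational support concrete, the second-order relation $\pi_i \approx s_i + \sqrt{s_i^2 + 2\sum_{k \neq i} M_{ik}\pi_k}$ can be iterated directly. The sketch below does this for a hypothetical two-type matrix in which both types have a negative selection coefficient; the function name, starting point, and iteration count are assumptions chosen for illustration, and the result is only the second-order approximation, not the exact fixation probability.

```python
import math
from typing import List


def second_order_fixation(M: List[List[float]], n_iter: int = 1000) -> List[float]:
    """Iterate pi_i = s_i + sqrt(s_i^2 + 2 * sum_{k != i} M_ik pi_k), s_i = M_ii - 1."""
    n = len(M)
    pi = [1.0] * n  # start from the largest possible value and iterate downward
    for _ in range(n_iter):
        new_pi = []
        for i in range(n):
            s_i = M[i][i] - 1.0
            # Contribution of mutational neighbors to the success of type i.
            support = sum(M[i][k] * pi[k] for k in range(n) if k != i)
            new_pi.append(s_i + math.sqrt(s_i * s_i + 2.0 * support))
        pi = new_pi
    return pi


if __name__ == "__main__":
    # No off-diagonal elements: recovers Haldane's pi = 2s.
    print(second_order_fixation([[1.01]]))
    # Hypothetical pair of types with s_i = -0.05 each, coupled by mutation.
    print(second_order_fixation([[0.95, 0.20], [0.20, 0.95]]))
```

In the coupled example each type has $s_i = -0.05$, yet the approximation yields a positive fixation probability because the spectral radius of the matrix exceeds one.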

For simplicity, we have considered only discrete, non-overlapping generations.
Generalization to continuous time is straightforward, see e.g. (Hermisson
et al. 2002; Harris 1963). In the continuous time case, the vector
of fixation probabilities $\pi$ is again determined by an equation of the form
$1 - \pi = f(1 - \pi)$. However, the generating function $f(z)$ is in general not
given by Eq. (5). Its functional form depends on the details of the continuous
time process that is being modeled. For example, if reproduction occurs
through binary fission, $f(z)$ will be quadratic in the variables $z_1, \dots, z_n$.

FIXATION ON A NEUTRAL NETWORK 

Exact expressions and estimates 

So far, we have made no assumptions about the structure and fitness dis- 
tribution of the invading quasispecies. This has led to a general equation 
for the vector of fixation probabilities $\pi$, but not much further analysis is
possible without a concrete model for the fitness landscape of the invading
quasispecies (we do not have to make any further assumptions about the established
quasispecies, since it enters the equations only through its average
fitness $\langle w \rangle$). The concrete fitness landscape we study is that of a neutral network
(Huynen et al. 1996; Bornberg-Bauer 1997) of related sequences with
identical replication rate $\sigma$. All sequences that are not part of the neutral
network are assumed to have a vanishing replication rate. Mutations occur
as random substitutions of single bases, and we allow for at most one substitution
per replication event, similar to the approach of van Nimwegen et
al. (1999). The probability of a substitution is given by $\mu$. The restriction
to at most a single substitution is a technicality that simplifies the analysis.
Generalization to more elaborate mutation schemes is possible along the lines
of (Wilke 2001a).



We denote the sequence length by $L$, and the number of different bases
by $\kappa$ ($\kappa = 4$ for RNA/DNA). For the matrix $M$, we only have to take
into account the sequences belonging to the neutral network. It is useful to
introduce the connection graph $G = (G_{ij})$. The elements $G_{ij}$ are 1 if and
only if two sequences $i$ and $j$ are exactly one mutation apart. In all other
cases, $G_{ij} = 0$. We can express $M$ in terms of $G$ as

$$M = (s + 1)\mathbf{1} + \beta G, \qquad (9)$$

where $s = \sigma(1 - \mu)/\langle w \rangle - 1$, $\beta = \sigma\mu / [\langle w \rangle L (\kappa - 1)]$, and $\mathbf{1}$ is the identity
matrix. We restrict our analysis to primitive connection graphs, in which
case the spectral radius $\rho_G$ of $G$ is given by the unique positive eigenvalue of
largest modulus of $G$ (Varga 2000). (Irreducibility, which is often assumed
in similar contexts, is not sufficient, since complex eigenvalues of modulus
$\rho_G$ may exist if $G$ is not primitive. Irreducible undirected connection graphs
of the kind we are considering here are primitive if they contain at least one
cycle of odd length.)

The spectral radius of $M$ is given in terms of the spectral radius of the
connection graph $\rho_G$ as

$$\rho_M = s + 1 + \beta \rho_G. \qquad (10)$$

This implies that fixation can occur as long as $\rho_M > 1$, that is, as long as $s > -\beta \rho_G$.
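The relation between the spectral radii of $M$ and $G$ can be checked numerically on a toy graph. The sketch below builds $M$ from a connection graph and estimates spectral radii by power iteration; the four-node graph (a triangle with one pendant node, so it contains an odd cycle), the parameter values, and the helper names are illustrative assumptions rather than anything taken from the RNA network studied here.

```python
from typing import List

Matrix = List[List[float]]


def build_M(G: Matrix, sigma: float, mu: float, w_mean: float,
            L: int, kappa: int = 4) -> Matrix:
    """M = (s + 1) * identity + beta * G for a neutral network."""
    s = sigma * (1.0 - mu) / w_mean - 1.0
    beta = sigma * mu / (w_mean * L * (kappa - 1))
    n = len(G)
    return [
        [(s + 1.0 if i == j else 0.0) + beta * G[i][j] for j in range(n)]
        for i in range(n)
    ]


def spectral_radius(A: Matrix, tol: float = 1e-13, max_iter: int = 100_000) -> float:
    """Power iteration; assumes A is non-negative and primitive."""
    n = len(A)
    v = [1.0 / n] * n  # positive start vector, normalized to sum 1
    rho = 0.0
    for _ in range(max_iter):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_rho = sum(w)  # equals the growth factor because sum(v) == 1
        v = [x / new_rho for x in w]
        if abs(new_rho - rho) < tol:
            return new_rho
        rho = new_rho
    return rho


if __name__ == "__main__":
    # Hypothetical toy graph: triangle 0-1-2 plus node 3 attached to node 2.
    G = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
    sigma, mu, w_mean, L = 1.05, 0.5, 1.0, 18
    rho_G = spectral_radius(G)
    rho_M = spectral_radius(build_M(G, sigma, mu, w_mean, L))
    s = sigma * (1.0 - mu) / w_mean - 1.0
    beta = sigma * mu / (w_mean * L * 3)
    print(rho_G, rho_M, s + 1.0 + beta * rho_G)
```

For a network of tens of thousands of sequences, a sparse representation of $G$ would be preferable to the dense lists used in this sketch.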
In an experimental setting, we cannot expect to have knowledge of the
complete connection graph $G$. Therefore, it is important to have approximations
for the fixation probability $\pi_i$. We consider two alternative methods.
Both are based on replacing the matrix $M$ in Eq. (6) by a suitable diagonal
matrix. This replacement leads to a decoupling of the equations for different
$\pi_i$.

The quantity that is easiest to obtain experimentally is the growth rate 
of the invading quasispecies relative to the established quasispecies, when 
initially both are present in large and equal amounts. From the definition of 
$M$, we see that this relative growth rate corresponds to the spectral radius $\rho_M$
of $M$. If we assume that every mutant present in the invading quasispecies
has an expectation of $\rho_M$ offspring per generation, then we can replace $M$
in Eq. (6) with a matrix that has entries $\rho_M$ on the diagonal, while all off-diagonal
elements are zero. Then, Eq. (6) simplifies to $1 - \pi_i = e^{-\rho_M \pi_i}$ for all $i$.
Clearly, this approximation will overestimate the $\pi_i$ for some mutants (mostly
those that produce on average fewer than $\rho_M$ offspring) and underestimate it for
others (mostly those that produce on average more than $\rho_M$ offspring). In the
following, we will refer to this estimate as the deterministic growth estimate,
because it is based on the assumption that the invading quasispecies grows
according to the deterministic equations from the outset.
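Because the deterministic growth estimate reduces to a single scalar equation, $1 - \pi = e^{-\rho_M \pi}$, it can be solved with a simple bracketing method. The following sketch uses bisection; the function name and tolerance are assumptions, and the example growth rate of 1.05 is chosen only for illustration.

```python
import math


def deterministic_growth_estimate(rho_M: float, tol: float = 1e-14) -> float:
    """Positive root of 1 - pi = exp(-rho_M * pi), or 0.0 if rho_M <= 1."""
    if rho_M <= 1.0:
        return 0.0

    def g(pi: float) -> float:
        # g(pi) = 1 - pi - exp(-rho_M * pi), written with expm1 for accuracy near 0.
        return -pi - math.expm1(-rho_M * pi)

    hi = 1.0  # g(1) = -exp(-rho_M) < 0
    lo = 0.5
    # Shrink lo until it lies between the trivial root 0 and the positive root.
    while g(lo) <= 0.0:
        hi = lo
        lo /= 2.0
    # Standard bisection: g > 0 below the positive root, g < 0 above it.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical relative growth rate of the invading quasispecies.
    print(deterministic_growth_estimate(1.05))
```

For growth rates only slightly above one, the result stays close to, but somewhat below, Haldane's $2(\rho_M - 1)$.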

The alternative method of estimating $\pi_i$ is as follows. It is reasonable to
assume that the first couple of replication cycles mostly determine fixation
or extinction for an invading sequence. During these initial generations, the
subpopulation descending from the invading sequence cannot explore the full
neutral network if the network is large. Therefore, the major contribution
to the fixation probability comes from the connection matrix of the local
genetic neighborhood of the invading sequence, and sequences further away
on the neutral network are relatively unimportant. The idea behind the second
approximation is therefore to calculate the fixation probability based
on a small area of genotype space surrounding the invading sequence. In
the simplest case, we consider only the invading sequence and its immediate
mutational neighbors. Assume sequence $i$ has $\nu_i$ neutral neighbors, i.e.,
$\sum_j G_{ij} = \nu_i$. Then the total expected number of offspring of sequence $i$ is
$\sum_j M_{ij} = s + 1 + \beta \nu_i$. Under the assumption that all offspring of $i$ have the
same expected number of further offspring, the probability of fixation satisfies
the equation $1 - \pi_i = e^{-(s + 1 + \beta \nu_i)\pi_i}$. We call the solution to this equation
the neutrality estimate. As in the case of the deterministic growth estimate,
it will overestimate the true fixation probability for some sequences, and
underestimate it for others.
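The neutrality estimate depends on a sequence only through its number of neutral neighbors, so it is cheap to tabulate. The sketch below evaluates it for a range of neutralities; the function name is an assumption, and the parameter values in the usage example are borrowed from the RNA example of this paper ($\sigma = 1.05$, $\mu = 0.5$, $L = 18$, $\rho_G = 15.7$) purely for illustration. Note that the estimate is exactly zero whenever the expected offspring number $s + 1 + \beta\nu$ does not exceed one.

```python
import math


def neutrality_estimate(nu: int, sigma: float, mu: float, w_mean: float,
                        L: int, kappa: int = 4, tol: float = 1e-13,
                        max_iter: int = 1_000_000) -> float:
    """Solve 1 - pi = exp(-(s + 1 + beta * nu) * pi) for a sequence with nu neighbors."""
    s = sigma * (1.0 - mu) / w_mean - 1.0
    beta = sigma * mu / (w_mean * L * (kappa - 1))
    rate = s + 1.0 + beta * nu  # expected offspring number, sum_j M_ij
    if rate <= 1.0:
        return 0.0  # subcritical or critical: no positive solution
    x = 0.0  # extinction probability, iterated upward from zero
    for _ in range(max_iter):
        x_new = math.exp(rate * (x - 1.0))
        if abs(x_new - x) < tol:
            return 1.0 - x_new
        x = x_new
    return 1.0 - x


if __name__ == "__main__":
    sigma, mu, L, rho_G = 1.05, 0.5, 18, 15.7
    w_mean = 1.0 - mu * (1.0 - rho_G / (3 * L))
    for nu in range(8, 21, 2):
        print(nu, neutrality_estimate(nu, sigma, mu, w_mean, L))
```

With these illustrative parameters the estimate vanishes for sequences with 12 or fewer neutral neighbors, which is one way to see why the neutrality estimate can severely underestimate the fixation probability of poorly connected sequences.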

Fixation on an RNA neutral network

We compared the two estimates to the exact fixation probabilities on a neu- 
tral network of RNA sequences. The network of 51,028 sequences of length 
$L = 18$ was found through exhaustive enumeration by van Nimwegen et al.
(1999). The spectral radius of the network's connection graph is $\rho_G = 15.7$.
In order to calculate fixation probabilities on this neutral network, we have
to make an assumption about the average fitness $\langle w \rangle$ of the established quasispecies.
We assume $\langle w \rangle = 1 - \mu[1 - \rho_G/(3L)]$, in which case the relative
growth rate of the invading quasispecies (at macroscopic concentration) with
respect to the established quasispecies follows from Eq. (10) as $\rho_M = \sigma$,
independent of the mutation rate.
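For readers who want to verify that this choice of average fitness makes the growth rate independent of the mutation rate, the short derivation below substitutes the definitions of $s$ and $\beta$ (with $\kappa = 4$) into the expression for the spectral radius of $M$. It is a worked check under the definitions stated in the text, not an additional result.

```latex
\begin{align*}
\rho_M &= s + 1 + \beta\rho_G
        = \frac{\sigma(1-\mu)}{\langle w\rangle}
          + \frac{\sigma\mu\,\rho_G}{\langle w\rangle L(\kappa-1)} \\
       &= \frac{\sigma}{\langle w\rangle}
          \left[1 - \mu\left(1 - \frac{\rho_G}{3L}\right)\right]
          \qquad (\kappa = 4) \\
       &= \sigma
          \qquad \text{for } \langle w\rangle = 1 - \mu\left[1 - \frac{\rho_G}{3L}\right].
\end{align*}
```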

Figure 1 displays the exact fixation probabilities (obtained numerically
from Eq. (6)) and the two estimates as functions of the mutation rate. We have
shown the average fixation probability $\bar{\pi} = \sum_i \pi_i / n$, the minimum probability
$\pi_{\min} = \min_i\{\pi_i\}$, and the maximum probability $\pi_{\max} = \max_i\{\pi_i\}$. Since
we chose $\langle w \rangle$ such that $\rho_M$ is independent of $\mu$, the deterministic growth
estimate is independent of $\mu$. We observe that the deterministic growth estimate
lies consistently above the average $\bar{\pi}$, but below the maximum $\pi_{\max}$.
The neutrality estimate underestimates the smallest fixation probabilities
and overestimates the largest ones. Its average lies slightly below $\bar{\pi}$ for small
mutation rates, and above $\bar{\pi}$ for large mutation rates. A more detailed plot
of the fixation probabilities at a fixed mutation rate of $\mu = 0.5$ is given
in Fig. 2. There, we display the fixation probability versus the neutrality
(number of neutral neighbors) of the invading sequence. The spread in the
fixation probabilities is remarkable. For sequences with a given neutrality,
the fixation probabilities vary over up to seven orders of magnitude. This
demonstrates the important influence of not only the nearest neighbors, but
also the wider genetic neighborhood on the fate of a single sequence in quasispecies
evolution. The neutrality estimate substantially underestimates the
fixation probabilities of those sequences that have only few immediate neutral
neighbors, but are otherwise located in a region of the genotype space
where the density of neutral sequences is high. In principle, we could improve
the neutrality estimate by taking into account all neutral sequences up
to some distance $d$, but in practice this method quickly becomes as unwieldy
as calculating the exact fixation probabilities.

Multiple invading sequences 

The above considerations address only the case of a single invading sequence. 
The generalization to more than one invading sequence is straightforward. 
Assume that a set $S$ of sequences, with $S = \{i_1, \dots, i_N\}$, invades an
established quasispecies. The probability that this invasion is successful is
given by $1 - \prod_{i \in S}(1 - \pi_i)$, where $\pi_i$ are the fixation probabilities of the
individual sequences. The probability of successful invasion of $N$ sequences
can be used as an indicator for the population size at which the deterministic
quasispecies equations capture the relevant dynamics of a finite population.
The fluctuations distinguishing the stochastic process of a finite population
from the deterministic description can be neglected if the invasion probability
is close to one. In Fig. 3, the fixation probability on the same neutral network
of RNA sequences that we have used before is displayed against the size of
the invading population. The individual data points are averaged over 1000
independent trials, where for each trial the $N$ starting sequences were chosen
at random. As before, $\langle w \rangle$ is chosen such that $\sigma$ is the average number of
offspring of the invading quasispecies in the deterministic limit.
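The invasion probability for a set of sequences, and its average over randomly chosen invading sets, can be sketched as follows. The function names are assumptions, as is the choice to draw the invading sequences without replacement (the text only states that they were chosen at random); the list of individual fixation probabilities in the usage example is made up for illustration.

```python
import math
import random
from typing import Sequence


def invasion_probability(pis: Sequence[float]) -> float:
    """1 - prod_i (1 - pi_i), evaluated in log space for numerical stability."""
    if any(p >= 1.0 for p in pis):
        return 1.0
    log_all_fail = sum(math.log1p(-p) for p in pis)
    return -math.expm1(log_all_fail)


def mean_invasion_probability(all_pis: Sequence[float], n_invaders: int,
                              trials: int = 1000, seed: int = 0) -> float:
    """Average invasion probability over random invading sets of a given size."""
    rng = random.Random(seed)
    pool = list(all_pis)
    total = 0.0
    for _ in range(trials):
        # Assumption: invaders are distinct sequences drawn uniformly at random.
        sample = rng.sample(pool, n_invaders)
        total += invasion_probability(sample)
    return total / trials


if __name__ == "__main__":
    # Hypothetical individual fixation probabilities spanning several orders of magnitude.
    pis = [10.0 ** (-k) for k in range(1, 7)] * 100
    for n in (1, 10, 100):
        print(n, mean_invasion_probability(pis, n))
```

Working with logarithms avoids loss of precision when many very small fixation probabilities are combined.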
Figure 3 shows that the population need not cover the relevant sequence
space in order to behave as predicted by the deterministic equations. On a
neutral network of over 50,000 sequences, a population of about 1000 behaves
deterministically at an advantage in growth rate of only 1%. It is important
to note that this advantage has been calculated under the assumption of an
infinite population, and that sufficiently small populations will grow substantially
more slowly (van Nimwegen et al. 1999). Apparently, here a population that
covers only 2% of the neutral network is not sufficiently small to experience
this reduction in growth rate.

DISCUSSION 



The exact expression for the probability of fixation in the quasispecies con- 
text is easy to evaluate numerically if the fitnesses of all relevant sequences 
are known. However, this data is normally not available for experimental 
systems, and approximations have to be used. What is most easily avail- 
able experimentally is the relative rate of growth of the two quasispecies at 
macroscopic concentrations, which is the basis of the deterministic growth es- 
timate. Since this estimate gives only a single number, independently of the 
sequence actually seeding the invading quasispecies, it does not reflect local
variations in the density of viable sequences around the invading sequence.
The neutrality estimate does not suffer from this shortcoming. However, it
requires the knowledge of the fitnesses of the immediate neighbors of the
invading sequence. Although experimentally tedious, these fitnesses can be
measured in principle. For example, Elena and Lenski (1997) generated 225
mutant strains of the bacterium Escherichia coli (each mutant differed from
the wild type by one, two, or three mutations), and measured the fitnesses
of the mutant strains relative to the wild type. The mutant neighborhood
of an RNA virus can conceivably be measured in a similar manner.

The predictive power of both the deterministic growth estimate and the 
neutrality estimate depends strongly on the distribution of neutral sequences 
in sequence space. For example, both estimates become exact for the case of a 
uniform neutral lattice, in which all sequences have exactly the same neutral- 
ity. Furthermore, we expect the neutrality estimate to perform particularly 
well in networks in which a sequence's neutrality is strongly correlated to 
the neutralities of its immediate and more distant neutral neighbors. The 
deterministic growth estimate, on the other hand, will yield best results if the 
neutral network does not decompose into areas that are substantially more 
densely or less densely connected than other areas. However, to what extent
these conditions are met in natural systems is questionable. As we have
seen in the present paper, the connection graph of a comparatively simple
neutral network, consisting of RNA sequences that are only eighteen base
pairs long, is already so heterogeneous that both estimates fail to give an
accurate prediction of the fixation probability for a substantial fraction of
sequences on that network. It is reasonable to assume that the distribution
of high-fitness sequences in sequence space for an RNA virus that consists of
several thousand bases is at least as heterogeneous as the one in our toy RNA
network, probably more so.

In the present work, we have only considered the fate of a single invading 
quasispecies. However, while an invading quasispecies is moving towards 
fixation or extinction, another mutant, one that belongs to a quasispecies 
of even higher mean fitness, may appear. The fixation probability of the 
first invader will then be modulated by the dynamics of the second one and 
vice versa, an effect commonly referred to as "clonal interference" (Gerrish
and Lenski 1998). Clonal interference has been reported in experiments with
vesicular stomatitis virus (Miralles et al. 1999; Miralles et al. 2000) and
with the bacterium Escherichia coli (de Visser et al. 1999). Currently, an
accurate mathematical description of clonal interference for the quasispecies 
case is not available. 

The approach we have followed in this work cannot directly be gener- 
alized to include clonal interference, because the assumption of a constant 
background average fitness $\langle w \rangle$ is not justified in the context of two (or more)
competing branching processes. A second problem that we have to solve in 
a theory of quasispecies clonal interference is the identification of advanta- 
geous mutants. Throughout the present paper, we have used the definition 
that an advantageous mutant is one that can grow into a quasispecies with 
higher average fitness than that of the currently established quasispecies. In 
order to use this definition in the context of clonal interference, we need to 
have a priori knowledge about how to best subdivide the sequence space into 
independent quasispecies. Only with this knowledge can we decide whether 
a particular new mutant is part of the parent quasispecies, or rather the 
founding member of a new quasispecies. A possible way to study clonal inter- 
ference in future work will be to consider a particular fitness landscape — for 
example, a set of intertwined neutral networks at different fitness levels — for 
which the a priori separation into distinct quasispecies is possible. For such 
a landscape, numerical studies of clonal interference will be straightforward,
and an analytic description should be possible as well. For landscapes that
are a priori unknown, even the numerical investigation of clonal interference 
will remain difficult until a workable method for the identification of advan- 
tageous mutants has been found. 

Recently, Jenkins et al. (2001) and Holmes and Moya (2002) expressed
doubts regarding the relevance of the quasispecies model for virology (but
see Domingo 2002). They argued that there is no unequivocal experimental
evidence for the quasispecies nature of RNA viruses, and that the determin- 
istic quasispecies equations are potentially not applicable to viral evolution 
on theoretical grounds, due to the immense size of the sequence space. The 
results of the present paper show that the second concern is not entirely 
justified. A single sequence has a positive probability to rise to fixation if 
and only if the average fitness of the quasispecies that will form eventually 
exceeds the average fitness of the currently established quasispecies. The 
individual fitness of the invading sequence has some influence on the exact 
value of that probability, but does not affect whether fixation is possible 
at all. Moreover, when the population size reaches several hundred, with 
probability of almost one the population will, for reasonable choices of the 
parameters, behave as predicted by the deterministic equations. A similar 
result has been obtained by van Nimwegen et al. (1999) for flow reactor
simulations, where on the same neutral network of RNA sequences that we
have studied here, quasispecies effects started to become important when the
product of population size and mutation rate $N\mu$ exceeded the value 10 [see
Fig. 3 of van Nimwegen et al. (1999)].

Wilke (2001b) studied the probability of fixation for RNA sequences in a
simulated flow reactor. The measured fixation probability was compared to 
an expression equivalent to the deterministic growth estimate of the present 
work (since continuous time simulations were used to generate the data, the 
exact expressions differ from those given here). The analytic expression cor- 
rectly predicted the parameter regions for which fixation was possible. In 
particular, the mutation rate at which a slower replicator with better muta- 
tional support could successfully invade a quasispecies consisting of sequences 
with higher individual fitnesses was determined accurately. However, the ex- 
act fixation probabilities seemed to be slightly overestimated. (Within the 
statistical accuracy of the data, a definite decision on this issue could not be 
made. While the data was in agreement with the model according to a $\chi^2$
test, it was not in agreement according to a non-parametric test based on
how often the data points fell above or below the predicted value.) 
The probability of fixation of advantageous mutants is obviously of tremendous
importance for disease dynamics and vaccines. For example, live vaccines
of attenuated poliovirus can contain small amounts of virulent poliovirus
variants (Chumakov et al. 1991), the reason being that attenuated
and virulent virus variants are often separated by only one or a few mutations.
In experiments, small amounts of highly virulent virus typically remain
suppressed by the less virulent virus, but once a threshold concentration of
the highly virulent virus variant is reached, infection occurs (de la Torre and
Holland 1990; Chumakov et al. 1991; Teng et al. 1996). The apparent existence
of such a threshold may well be a result of insufficient resolution of
the experiments. Whether the highly virulent strain will grow is determined
by stochastic fluctuations, and as we have seen in Fig. 3, the probability of
fixation decays quickly with shrinking initial concentration of the virulent
strain. If such a strain in a vaccine has a 1% chance to cause infection, then
well over a hundred replicates of the appropriate assay are necessary to observe
at least one infection with high probability. Probabilities of this magnitude or
lower can easily be missed at low numbers of replicates, so that the virulent
strain appears to be safely suppressed.
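The number of replicates needed to detect a rare infection event follows from elementary probability: if each replicate independently leads to infection with probability $p$, at least one infection is observed in $n$ replicates with probability $1 - (1 - p)^n$. The sketch below evaluates this; the function names and the confidence levels are illustrative assumptions, and independence between replicates is assumed.

```python
import math


def prob_at_least_one(p: float, n: int) -> float:
    """Probability of observing at least one infection in n independent replicates."""
    return 1.0 - (1.0 - p) ** n


def replicates_needed(p: float, confidence: float) -> int:
    """Smallest n such that at least one infection is seen with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    p = 0.01  # per-replicate infection probability, as in the example in the text
    print(prob_at_least_one(p, 100))
    print(replicates_needed(p, 0.95))
    print(replicates_needed(p, 0.99))
```

For $p = 0.01$, one hundred replicates yield at least one infection only about 63% of the time, and roughly three hundred replicates are needed to reach 95% confidence.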
APPENDIX



We consider a model with discrete, non-overlapping generations, and a
constant population size $N$. Under the assumption that the reproductive
success of a sequence $i$ is proportional to its fitness $w_i$, the probability that
a randomly chosen sequence in the next generation is offspring of sequence
$i$ is given by $\xi_i = w_i / (\langle w \rangle N)$, where $\langle w \rangle$ is the average fitness in the population.
Since there are $N$ sequences in the population, the probability that
$k$ of them are offspring of sequence $i$ is binomial, $P(k|i) = \binom{N}{k} \xi_i^{k} (1 - \xi_i)^{N-k}$.
Now consider a sequence of type $r$ in the offspring generation. For the probability
that the parent of sequence $r$ is a particular sequence $i$ of the previous
generation, we find $\xi_r = Q_{ri}\xi_i = w_i Q_{ri} / (\langle w \rangle N)$, because only a fraction $Q_{ri}$
of the total offspring of $i$ will be of type $r$. Following the previous argument,
we find for the probability that sequence $i$ leaves $k_r$ offspring of type $r$:

$$P(k_r | i) = \binom{N}{k_r}\, \xi_r^{k_r} (1 - \xi_r)^{N - k_r}.$$

We can extend the above argument to sequences of two types $r$ and $s$.
The probability that sequence $i$ leaves $k_r$ offspring sequences of type $r$ and
$k_s$ offspring sequences of type $s$ is the probability that $k_r$ offspring are of
type $r$, $\xi_r^{k_r}$, times the probability that $k_s$ offspring are of type $s$, $\xi_s^{k_s}$, times
the probability that the remaining offspring are either of different types, or
have different parent sequences, $(1 - \xi_r - \xi_s)^{N - k_r - k_s}$, times the number of
possible ways in which $k_r$ and $k_s$ sequences can be chosen out of the total of $N$
sequences in the population. This latter number is a multinomial coefficient,
$N!/[k_r!\,k_s!\,(N - k_r - k_s)!]$. Putting everything together, we find

$$P(k_r, k_s | i) = \frac{N!}{k_r!\,k_s!\,(N - k_r - k_s)!}\, \xi_r^{k_r}\, \xi_s^{k_s}\, (1 - \xi_r - \xi_s)^{N - k_r - k_s}.$$

By repeating this argument for $n$ different sequence types, and with the
definition $M_{ij} := N\xi_j = w_i Q_{ji} / \langle w \rangle$, we arrive at Eq. (2).
Figure 1: Fixation probability versus mutation rate in a neutral network of
51,028 RNA sequences taken from (van Nimwegen et al. 1999). Solid lines
correspond to the solution of the full equations, dashed lines correspond
to the neutrality estimate, and the dotted line indicates the deterministic
growth estimate. ($\sigma = 1.05$, $L = 18$, $\rho_G = 15.7$, $\langle w \rangle = 1 - \mu[1 - \rho_G/(3L)]$.)

Figure 2: Fixation probability versus neutrality $\nu$ of the invading sequence
in a neutral network of 51,028 RNA sequences taken from (van Nimwegen
et al. 1999). The dots stem from the exact numerical solution, the
dashed line corresponds to the neutrality estimate, and the dotted line indicates
the deterministic growth estimate. The inset shows the distribution
of neutralities in the network. ($\mu = 0.5$, $\sigma = 1.05$, $L = 18$, $\rho_G = 15.7$,
$\langle w \rangle = 1 - \mu[1 - \rho_G/(3L)]$.)

Figure 3: Fixation probability $\pi$ versus size of the invading population $N$
in a neutral network of 51,028 RNA sequences. The fixation probability is
averaged over 1000 independent sets of invading sequences, chosen at random.
The error bars indicate the standard deviation. Lines are meant as a guide
to the eye. ($\mu = 0.2$, $L = 18$, $\rho_G = 15.7$, $\langle w \rangle = 1 - \mu[1 - \rho_G/(3L)]$.)